[COST-5216] Delete filtering optimization #5197

Merged 2 commits on Jul 2, 2024


myersCody
Contributor

Jira Ticket

COST-5216

Description

This change switches delete filtering to use `head_object`, which fetches only an object's metadata instead of the S3 object itself. This should save some time, since we no longer spend cycles retrieving the additional attributes S3 makes available for the object, such as the ACL, owner, and storage type.
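
To illustrate the idea, here is a minimal sketch of metadata-based filtering built on `head_object`. It assumes a client that behaves like `boto3.client("s3")`. The function name `filter_keys_by_metadata`, its parameters, and the `match` flag are hypothetical and are not taken from this PR's code or from the linked gist.

```python
def filter_keys_by_metadata(s3_client, bucket, keys, metadata_key, metadata_value, match=True):
    """Return the keys whose user-defined metadata does (match=True) or does not
    (match=False) carry metadata_key == metadata_value.

    Assumption: s3_client behaves like boto3.client("s3"). Only head_object is
    called, so no object body is downloaded.
    """
    selected = []
    for key in keys:
        # HEAD request: the response carries headers and user metadata only.
        response = s3_client.head_object(Bucket=bucket, Key=key)
        metadata = response.get("Metadata", {})
        is_match = metadata.get(metadata_key) == metadata_value
        if is_match == match:
            selected.append(key)
    return selected
```

With a real client this might be called as `filter_keys_by_metadata(boto3.client("s3"), "example-bucket", keys, "test-key", "1")`, where the bucket name is a placeholder. Passing `match=False` returns the complementary set of keys.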

Testing

I wrote a little test script we can use for this:
https://gist.github.com/myersCody/0c57dc2132a7a712c035ae1210c30d20

Output:

TEST NOT MATCHING
------
2023/04/30/a678b047-f78e-4ad8-9fbe-1e3b73bd5a24/
Metadata: {}


------
2023/04/30/a678b047-f78e-4ad8-9fbe-1e3b73bd5a24/0d34ec6a-1cab-4c89-a03e-cb471c0ba067.csv
Metadata: {'test-key': '1'}


------
2023/04/30/a678b047-f78e-4ad8-9fbe-1e3b73bd5a24/0d34ec6a-1cab-4c89-a03e-cb471c0ba067.csv.metadata
Metadata: {}


['2023/04/30/a678b047-f78e-4ad8-9fbe-1e3b73bd5a24/', '2023/04/30/a678b047-f78e-4ad8-9fbe-1e3b73bd5a24/0d34ec6a-1cab-4c89-a03e-cb471c0ba067.csv.metadata']
TEST MATCHING
------
2023/04/30/a678b047-f78e-4ad8-9fbe-1e3b73bd5a24/
Metadata: {}


------
2023/04/30/a678b047-f78e-4ad8-9fbe-1e3b73bd5a24/0d34ec6a-1cab-4c89-a03e-cb471c0ba067.csv
Metadata: {'test-key': '1'}


------
2023/04/30/a678b047-f78e-4ad8-9fbe-1e3b73bd5a24/0d34ec6a-1cab-4c89-a03e-cb471c0ba067.csv.metadata
Metadata: {}


['2023/04/30/a678b047-f78e-4ad8-9fbe-1e3b73bd5a24/0d34ec6a-1cab-4c89-a03e-cb471c0ba067.csv']

Release Notes

  • proposed release note
* [COST-5216](https://issues.redhat.com/browse/COST-5216) Optimize delete filtering logic to only collect necessary metadata

@myersCody myersCody requested review from a team as code owners July 1, 2024 19:16
@myersCody myersCody added the aws-smoke-tests pr_check will build the image and run aws + ocp on aws smoke tests label Jul 1, 2024

codecov bot commented Jul 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.2%. Comparing base (abd4261) to head (9240cc1).

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #5197   +/-   ##
=====================================
  Coverage   94.2%   94.2%           
=====================================
  Files        376     376           
  Lines      31252   31254    +2     
  Branches    3734    3734           
=====================================
+ Hits       29426   29431    +5     
+ Misses      1163    1160    -3     
  Partials     663     663           

@lcouzens lcouzens merged commit 97ba98e into main Jul 2, 2024
11 checks passed
@lcouzens lcouzens deleted the COST-5216-head-object branch July 2, 2024 12:18
myersCody added a commit that referenced this pull request Jul 2, 2024
myersCody added a commit that referenced this pull request Jul 2, 2024
djnakabaale pushed a commit that referenced this pull request Jul 9, 2024
djnakabaale added a commit that referenced this pull request Jul 19, 2024
…5117)

* feat: cleaning up old changes. making first changes to create the API pieces.

* feat: cleaning up old changes. making first changes to create the API pieces.

* feat: insert new filters.

* feat: insert new filters and order by params.

* feat: customizing provider map and serializer.

* fix: changing TIME_CHOICES options.

* feat: first unit tests.

* feat: wip.

* feat: fixing provider map.

* feat: fixing provider map.

* update ec2 annotations to get all required fields

* use AWSEC2ComputeQueryParamSerializer

* feat: creating orderby and groupby serializers for ec2

* feat: unit tests for filters

* feat: removing group_by - not needed on ec2

* feat: fixing orderby serializer and starting units tests

* fix: typo

* feat: changing usage_hours to usage_amount

* feat: fixing unit test.

* wip: blocking some filters and unit test.

* feat: unit tests for group by filter

* flake8 fix

* feat: inserting more filters on validate function.

* feat: updating validate method to use similar logic and add filters.

* fix: changing unit tests for some filters.

* feat: testing filter combinations and flake8 checks.

* fix: test

* feat: serializer Unit tests and view Unit test fix

* flake8 fix

* fix: new approach to satisfy CodeCov

* fix: getting rid of validate custom method.

* fix: commenting tags.

* fix: validate functions, tests.

* handle filter params for specific report type

* transform tags to desired ui format

* default to monthly resolution on the EC2 endpoint

* add special pagination for EC2

* use default report type time period settings if exists

* fix typo

* fix: fixing parameters validations.

* [COST-5141] Fix management command to use continue instead of return. (#5173)

* [COST-5128] Process new subs tagging strategy to identify non-converted instances (#5162)

* [COST-4745] Add data_transfer_direction to OCP on GCP Trino tables (#5130)

* [COST-4741] Add data_transfer_direction for AWS network costs to Trino tables (#5129)

* [COST-5168] - Adding new penalty pipeline (#5176)

* [COST-5168] - Adding new penalty pipeline

* Improve our logging readability (#5178)

* add prometheus metrics for new queues (#5179)

* add v3.3.0 operator commits (#5143)

* [COST-5124] Improve Trino migration management command (#5163)

* Add exponential backoff and logging to retries
* Change log level to reflect severity
* Explicit SQL alias for clarity
* Catch and log exception instead of exiting
* Add return type hints
* Return if unsuccessful
  No point in verifying if the SQL did not run correctly
* Fine tune exponential backoff
* Create action class for adding/verifying columns were added
* Assign default list using default_factory
  Instead of doing it in the post_init, which gets a little weird.
* Add drop column action
* Quote items in logs for better legibility
* Consolidate action classes
  We lose some of the action-specific logging messages, but there is less
  code overall. I’m not sure how this scales to the action related to dropping.
* Change local variable name
  No need to add a prefix to differentiate it from the parameter name.
* Use a set to prevent running on the same schema multiple times

Co-authored-by: Cody Myers <[email protected]>

* Filter accounts by matching criteria during subs processing to prevent unnecessary SQL from running (#5184)

* Update tasks.py (#5185)

* clean up grafana dashboard (#5183)

* Skip OCPCloud tag SQL if key is present in cache but value is None (#5186)

* [COST-5196] - Send OCP tasks to correct queues (#5187)

* [COST-5196] - Send OCP tasks to correct queues

* [COST-5176] correctly pass context dictionary within log_json function call (#5182)

* [COST-5176] correctly pass context dictionary within log_json function call

* add unittests for exceptions in generate_report

* batch delete S3 files (#5180)

* Bump urllib3 from 1.26.18 to 1.26.19 in the pip group across 1 directory (#5172)

Bumps the pip group with 1 update in the / directory: [urllib3](https://github.com/urllib3/urllib3).


Updates `urllib3` from 1.26.18 to 1.26.19
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/1.26.19/CHANGES.rst)
- [Commits](urllib3/urllib3@1.26.18...1.26.19)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add flower as a dev dependency (#5189)

* Add docs

* [COST-4844] Serializer update for ordering by storageclass (#5174)

* Switch to using podman in build_deploy (#5193)

The VM used in CI is now RHEL 8

* skip polling providers still processing (#5181)

* skip polling providers that are still processing

* [COST-5214] Move TARGETARCH declaration to the top of the Dockerfile (#5195)

There is a bug in podman where this is only used correctly for the
multi-stage build if it is defined as the first line.

Update Jenkinsfile to use RHEL 8

Unfortunately this breaks the image build for Docker. I'll fix that in a followup PR.

* [COST-5213] - fix S3 prepare (#5194)

* Switch default parquet flag to prevent iterating on all files in each worker when there is nothing to delete

* [COST-5214] pass build-arg to docker build command (#5196)

* [COST-5216] Delete filtering optimization (#5197)

* Revert "[COST-5216] Delete filtering optimization (#5197)" (#5200)

This reverts commit 97ba98e.

* [COST-5226] - Skip S3 delete (daily flow) if we have marked deletion complete. (#5198)

* dont attempt more S3 deletes if we have marked deletion complete

* [COST-5076] upgrade to python 3.11 (#4444)

* upgrade to python 3.11

* pipfile update

* add gcc-c++ compiler

Co-authored-by: Sam Doran <[email protected]>

* update test

* replace gcc with gcc-c++

---------

Co-authored-by: Sam Doran <[email protected]>

* [COST-5228] log outside for loop (#5202)

* [COST-5228] log outside for loop

* additional log clean up

* add context to logs in _remove_expired_data func

* log s3 batch deletes (#5204)

* log s3 batch deletes

* [COST-5219] Correctly report VM usage for metering when billing record is split (#5201)

* [COST-5219] Handle Azure instance record being split

* [COST-4745] OCPGCP Network data processing SQL (#5058)

* [COST-4745] OCPGCP Network data processing SQL

---------

Co-authored-by: Sam Doran <[email protected]>

* [COST-5198] - split read traffic to read replica db using nginx proxy (#5188)

* update nginx with HTTP method routing
* switch koku-api to koku-api-writes
* duplicate koku-api to koku-api-reads and add an optional mounted secret for the read replica
* update clowder configurator to read from read replica secret if mounted and enabled via ENV var

* remove unused methods (#5208)

* Bump certifi in the pip group across 1 directory (#5207)

* chore(image): update and rebuild image (#5203)

Co-authored-by: Update-a-Bot <[email protected]>

* Handle case when resource ID cannot be obtained (#5209)

* Catch exception case.

* [COST-5148] filter out empty resource ids and SavingsPlanCoveredUsage entries (#5206)

* [COST-5148] update insert sql
filter out empty resource ids
offset savings from SavingsPlanCoveredUsage

* closing CASE statement

* clean up comment

* remove case stmts in favor of filtering out SavingsPlanCoveredUsage

* clean up

* Unpause the csi volume handle sql (#5175)

* update linting

* squash commits

* clean up query params and time period settings

* do not use filter keyword

* more code clean up
update unit tests

* address feedback
- move report_specific filters to main filter map
- use report_type instead of kwargs in get_paginator
- do not use deepcopy - just overwrite query_data
- resolution and time_scope_units are always monthly and month respectively
- override start and end date params in base ParamSerializer
- override limit and offset in base FilterSerializer

* more unit tests

* update openapi spec

* clean up and add unit tests

* move changes to openapi spec to a separate pr

* use serializer choice field and not custom validate method

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: David N <[email protected]>
Co-authored-by: Luke Couzens <[email protected]>
Co-authored-by: Cody Myers <[email protected]>
Co-authored-by: Corey Goodfred <[email protected]>
Co-authored-by: Sam Doran <[email protected]>
Co-authored-by: Michael Skarbek <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Hambridge <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Update-a-Bot <[email protected]>
Labels
aws-smoke-tests pr_check will build the image and run aws + ocp on aws smoke tests smokes-required

3 participants